RWCRExpr(3C++)                                              RWCRExpr(3C++)

Name
     RWCRExpr - Rogue Wave library class

Synopsis
     #include <rw/re.h>
     RWCRExpr re(".*\\.doc");  // Matches filename with suffix ".doc"

Description
     Class RWCRExpr represents an extended regular expression such as
     those found in lex and awk. The constructor "compiles" the
     expression into a form that can be used more efficiently. The
     results can then be used for string searches using class RWCString.
     Regular expressions can be of arbitrary size, limited by memory.
     The extended regular expression features found here are a subset of
     those found in the POSIX.2 standard (ANSI/IEEE Std 1003.2, ISO/IEC
     9945-2).

     Note: RWCRExpr is available only if your compiler supports
     exception handling and the C++ Standard Library.

     The regular expression (RE) is constructed as follows:

     The following rules determine one-character REs that match a single
     character:

          Any character that is not a special character (to be defined)
          matches itself.

          A backslash (\) followed by any special character matches the
          literal character itself; that is, this "escapes" the special
          character.

          The "special characters" are:

               + * ? . [ ] ^ $ ( ) { } | \

          The period (.) matches any character. E.g., ".umpty" matches
          either "Humpty" or "Dumpty".

          A set of characters enclosed in brackets ([ ]) is a
          one-character RE that matches any of the characters in that
          set. E.g., "[akm]" matches either an "a", "k", or "m". A range
          of characters can be indicated with a dash. E.g., "[a-z]"
          matches any lower-case letter. However, if the first character
          of the set is the caret (^), then the RE matches any character
          except those in the set. It does not match the empty string.
          Example: [^akm] matches any character except "a", "k", or "m".
          The caret loses its special meaning if it is not the first
          character of the set.

     The following rules can be used to build a multicharacter RE:

          Parentheses (( )) group parts of regular expressions together
          into subexpressions that can be treated as a single unit. For
          example, (ha)+ matches one or more "ha"'s.

          A one-character RE followed by an asterisk (*) matches zero or
          more occurrences of the RE. Hence, [a-z]* matches zero or more
          lower-case characters.

          A one-character RE followed by a plus (+) matches one or more
          occurrences of the RE. Hence, [a-z]+ matches one or more
          lower-case characters.

          A question mark (?) marks an optional element. The preceding
          RE can occur zero times or once in the string -- no more.
          E.g., xy?z matches either xyz or xz.

          The concatenation of REs is a RE that matches the
          corresponding concatenation of strings. E.g., [A-Z][a-z]*
          matches any capitalized word.

          The OR character ( | ) allows a choice between two regular
          expressions. For example, jell(y|ies) matches either "jelly"
          or "jellies".

          Braces ({ }) are reserved for future use.

     All or part of the regular expression can be "anchored" to either
     the beginning or end of the string being searched:

          If the caret (^) is at the beginning of the (sub)expression,
          then the matched string must be at the beginning of the string
          being searched.

          If the dollar sign ($) is at the end of the (sub)expression,
          then the matched string must be at the end of the string being
          searched.

Persistence
     None

Example
     #include <rw/re.h>
     #include <rw/cstring.h>
     #include <rw/rstream.h>

     int main(){
       RWCString aString("Hark! Hark! the lark");

       // A regular expression matching any lowercase word or end of a
       // word starting with "l":
       RWCRExpr re("l[a-z]*");
       cout << aString(re) << endl;  // Prints "lark"
       return 0;
     }

Public Constructors
     RWCRExpr(const char* pat);
     RWCRExpr(const RWCString& pat);
          Construct a regular expression from the pattern given by pat.
          The status of the results can be found by using member
          function status().

     RWCRExpr(const RWCRExpr& r);
          Copy constructor. Uses value semantics -- self will be a copy
          of r.

     RWCRExpr();
          Default constructor. You must assign a pattern to the regular
          expression before you use it.

Public Destructor
     ~RWCRExpr();
          Destructor. Releases any allocated memory.

Assignment Operators
     RWCRExpr& operator=(const RWCRExpr& r);
          Recompiles self to pattern found in r.

     RWCRExpr& operator=(const char* pat);
     RWCRExpr& operator=(const RWCString& pat);
          Recompiles self to the pattern given by pat. The status of the
          results can be found by using member function status().

Public Member Functions
     size_t index(const RWCString& str, size_t* len = NULL,
                  size_t start=0) const;
          Returns the index of the first instance in the string str that
          matches the regular expression compiled in self, or RW_NPOS if
          there is no such match. The search starts at index start. The
          length of the matching pattern is returned in the variable
          pointed to by len. If an invalid regular expression is used
          for the search, an exception of type RWInternalErr will be
          thrown. Note that this member function is relatively clumsy to
          use -- class RWCString offers a better interface to regular
          expression searches.
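
The index() contract described above (first match at or after a start offset, match length returned through a pointer, a "not found" sentinel otherwise) can be sketched without the Rogue Wave headers. The following is only an illustration under stated assumptions: it uses std::regex with the POSIX extended grammar as a stand-in for RWCRExpr, std::string::npos in the role of RW_NPOS, and a helper named indexOf that is hypothetical and not part of any library.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of Rogue Wave): mimics the documented
// contract of RWCRExpr::index() with std::regex as a stand-in.
// Returns the index of the first match at or after `start`, or
// std::string::npos (playing the role of RW_NPOS) if there is none.
// If `len` is non-null, the length of the match is stored through it.
std::size_t indexOf(const std::regex& re, const std::string& str,
                    std::size_t* len = nullptr, std::size_t start = 0)
{
    if (start > str.size())
        return std::string::npos;

    const auto first =
        str.begin() + static_cast<std::string::difference_type>(start);

    std::smatch m;
    if (!std::regex_search(first, str.end(), m, re))
        return std::string::npos;

    if (len)
        *len = static_cast<std::size_t>(m.length(0));

    // position(0) is relative to the iterator the search started from,
    // so the start offset is added back to get an index into `str`.
    return start + static_cast<std::size_t>(m.position(0));
}
```

With a std::regex built from "l[a-z]*" using std::regex::extended and the string from the Example section, this sketch yields index 16 and length 4, the position and length of "lark". Unlike the documented RWCRExpr behavior, an invalid pattern fails when the std::regex is constructed rather than when the search runs.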
     statusType status() const;
          Returns the status of the regular expression:

          statusType                        Meaning

          RWCRExpr::OK                      No errors
          RWCRExpr::NOT_SUPPORTED           POSIX.2 feature not yet
                                            supported
          RWCRExpr::NO_MATCH                Tried to find a match but
                                            failed
          RWCRExpr::BAD_PATTERN             Pattern was illegal
          RWCRExpr::BAD_COLLATING_ELEMENT   Invalid collating element
                                            referenced
          RWCRExpr::BAD_CHAR_CLASS_TYPE     Invalid character class type
                                            referenced
          RWCRExpr::TRAILING_BACKSLASH      Trailing \ in pattern
          RWCRExpr::UNMATCHED_BRACKET       [] imbalance
          RWCRExpr::UNMATCHED_PARENTHESIS   () imbalance
          RWCRExpr::UNMATCHED_BRACE         {} imbalance
          RWCRExpr::BAD_BRACE               Content of {} invalid
          RWCRExpr::BAD_CHAR_RANGE          Invalid endpoint in [a-z]
                                            expression
          RWCRExpr::OUT_OF_MEMORY           Out of memory
          RWCRExpr::BAD_REPEAT              ?, * or + not preceded by
                                            valid regular expression
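
The idea of checking a pattern after compiling it can be illustrated with standard C++ alone. This is a sketch under assumptions, not the Rogue Wave mechanism: std::regex signals a bad pattern by throwing std::regex_error at construction instead of recording a status value, and the helper name compiles is hypothetical. The codes in std::regex_constants (error_brack, error_paren, error_brace, error_badrepeat and so on) loosely parallel the table above, but which code a given implementation reports for a given pattern can differ, so the sketch reports only success or failure plus the library's message.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of Rogue Wave): tries to compile `pat`
// as a POSIX extended regular expression with std::regex. Returns true
// on success. On failure it returns false and, if `why` is non-null,
// stores the implementation-defined error description in it.
bool compiles(const std::string& pat, std::string* why = nullptr)
{
    try {
        std::regex re(pat, std::regex::extended);
        (void)re;  // only the side effect of construction matters here
        return true;
    } catch (const std::regex_error& e) {
        if (why)
            *why = e.what();
        return false;
    }
}
```

Patterns taken from the Description, such as "jell(y|ies)" and "(ha)+", compile under this sketch, while "[a-z" (unbalanced bracket) and "(ha" (unbalanced parenthesis) are rejected.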